feat(swe): add Qwen3-30B SWE-bench async-GRPO recipe (vLLM + SGLang) by Kh4L · Pull Request #2961 · NVIDIA-NeMo/RL

Kh4L · 2026-06-26T21:05:29Z

What

Adds the multi-turn SWE-bench agentic async-GRPO recipe for Qwen3-30B-A3B-Thinking (MoE, 30B total / ~3B active):

examples/swe_bench/grpo_qwen3_30b_async_swe.yaml — the recipe config
examples/swe_bench/run_grpo_repro_baseline_swe2.sh — vLLM baseline launcher (reproduces the ~8%-resolved reference run)
examples/swe_bench/run_grpo_swe2_scale_gen.sh — generation-scaling sweep launcher with a BACKEND=vllm|sglang switch
examples/swe_bench/REPRO_swe2.md — vLLM baseline reproduction guide
examples/swe_bench/REPRO_swe2_sglang.md — SGLang reproduction guide

Why

Provides a reproducible reference for multi-turn SWE-bench RL (the baseline resolves ~8% from step 1) and a working SGLang generation path at parity with vLLM on three axes: rollout completeness, throughput, and training-grade per-token logprob agreement.

Status — draft, depends on #2447

The BACKEND=sglang path needs the enhanced SGLang backend (Megatron→SGLang MoE/PP weight-refit, router, fault-tolerance) from #2447, which is not yet merged. On current main's basic SGLang backend the SGLang path will not run the 30B-MoE recipe; the vLLM path is self-contained. Kept as a draft until #2447 lands. The companion gym-proxy token-splicing contiguity fix (required for multi-turn SGLang) is in NVIDIA-NeMo/Gym#1787.
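Illustration of the contiguity property mentioned above: a multi-turn rollout is prefix-stable when each turn's prompt token IDs begin with the previous turn's prompt plus its generated tokens. This is a minimal sketch under assumptions, not the Gym proxy's code: the function names and the `prompt_ids` / `output_ids` record layout are hypothetical.

```python
from typing import Dict, List, Sequence


def is_prefix(prefix: Sequence[int], full: Sequence[int]) -> bool:
    """Return True if `full` starts with exactly the tokens in `prefix`."""
    return len(prefix) <= len(full) and list(full[: len(prefix)]) == list(prefix)


def count_contiguity_failures(turns: List[Dict[str, List[int]]]) -> int:
    """Count turns whose prompt does not extend the previous turn's tokens.

    Each turn is a dict with hypothetical keys:
      'prompt_ids': token IDs sent to the engine for this turn
      'output_ids': token IDs the engine generated for this turn
    """
    failures = 0
    for prev, cur in zip(turns, turns[1:]):
        # Tokens the next prompt is expected to start with.
        expected_prefix = list(prev["prompt_ids"]) + list(prev["output_ids"])
        if not is_prefix(expected_prefix, cur["prompt_ids"]):
            failures += 1
    return failures


# Toy rollouts: the first is prefix-stable, the second re-tokenizes turn 1
# differently (token 3 became 9), which breaks contiguity.
stable_rollout = [
    {"prompt_ids": [1, 2], "output_ids": [3, 4]},
    {"prompt_ids": [1, 2, 3, 4, 5], "output_ids": [6]},
]
broken_rollout = [
    {"prompt_ids": [1, 2], "output_ids": [3, 4]},
    {"prompt_ids": [1, 2, 9, 4, 5], "output_ids": [6]},
]
```

A count of zero over all rollouts corresponds to what the Validation section reports as "contiguity failures 0", assuming the check there is of this kind.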

Validation

Port parity: 8/8 SGLang multi-turn rollouts completed, 0 contiguity failures, ~193 generated tok/s with full CUDA graph (≈ vLLM).
Logprob parity (teacher-forced, 27,493 tokens): the cross-engine median |Δ| is 1.38e-3, approximately the within-engine bf16/MoE noise floor (1.24e-3); i.e. vLLM differs from SGLang about as little as SGLang differs from itself. Details in REPRO_swe2_sglang.md.
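For readers who want to see the shape of the logprob comparison, here is a minimal sketch, assuming the metric is the median absolute difference between per-token logprobs that two runs assign to the same teacher-forced token sequence. The function names are hypothetical and the toy values are illustrative, not data from this PR; the actual procedure is the one described in REPRO_swe2_sglang.md.

```python
from statistics import median
from typing import Sequence


def median_abs_delta(a: Sequence[float], b: Sequence[float]) -> float:
    """Median of |a_i - b_i| over per-token logprobs of the same tokens."""
    if len(a) != len(b):
        raise ValueError("logprob sequences must cover the same tokens")
    if not a:
        raise ValueError("need at least one token")
    return median(abs(x - y) for x, y in zip(a, b))


def noise_ratio(cross_engine: float, within_engine: float) -> float:
    """Cross-engine median |delta| relative to the within-engine noise floor.

    A ratio near 1 means the two engines disagree about as much as one
    engine disagrees with itself across repeated runs.
    """
    return cross_engine / within_engine


# Toy logprobs for three teacher-forced tokens (illustrative values only).
engine_a = [-1.0, -2.0, -3.0]
engine_b = [-1.5, -2.0, -2.0]
toy_delta = median_abs_delta(engine_a, engine_b)

# Ratio for the two medians reported above (1.38e-3 vs 1.24e-3).
reported_ratio = noise_ratio(1.38e-3, 1.24e-3)
```

Using the median rather than the mean keeps a few outlier tokens from dominating the comparison.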

Reproduced end-to-end from a clean clone.

Signed-off-by: Serge Panev <spanev@nvidia.com>

copy-pr-bot · 2026-06-26T21:05:33Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Kh4L · 2026-06-26T21:54:24Z

Closing: this draft inadvertently contained internal cluster/filesystem paths. Will re-open a sanitized version.

feat(swe): add Qwen3-30B SWE-bench async-GRPO recipe (vLLM + SGLang)

aeb655e

Signed-off-by: Serge Panev <spanev@nvidia.com>

Kh4L mentioned this pull request Jun 26, 2026

fix(sglang): keep multi-turn prompts prefix-stable via token-splicing NVIDIA-NeMo/Gym#1787

Draft

Kh4L closed this Jun 26, 2026

Kh4L deleted the sglang-swe2-recipe branch June 26, 2026 21:54

Kh4L wants to merge 1 commit into NVIDIA-NeMo:main from Kh4L:sglang-swe2-recipe
